NSF PAR Search | NSF Public Access Repository


Cross-modal Map Learning for Vision and Language Navigation

https://doi.org/10.1109/CVPR52688.2022.01502

Georgakis, Georgios; Schmeckpeper, Karl; Wanchoo, Karan; Dan, Soham; Miltsakaki, Eleni; Roth, Dan; Daniilidis, Kostas (June 2022, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

We consider the problem of Vision-and-Language Navigation (VLN). The majority of current methods for VLN are trained end-to-end using either unstructured memory such as LSTM, or using cross-modal attention over the egocentric observations of the agent. In contrast to other works, our key insight is that the association between language and vision is stronger when it occurs in explicit spatial representations. In this work, we propose a cross-modal map learning model for vision-and-language navigation that first learns to predict the top-down semantics on an egocentric map for both observed and unobserved regions, and then predicts a path towards the goal as a set of way-points. In both cases, the prediction is informed by the language through cross-modal attention mechanisms. We experimentally test the basic hypothesis that language-driven navigation can be solved given a map, and then show competitive results on the full VLN-CE benchmark.
Full Text Available
Cross-Modal Map Learning for Vision and Language Navigation

Georgakis, Georgios; Schmeckpeper, Karl; Wanchoo, Karan; Dan, Soham; Miltsakaki, Eleni; Roth, Dan; Daniilidis, Kostas (January 2022, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

Full Text Available
From Spatial Relations to Spatial Configurations

Dan, Soham; Kordjamshidi, Parisa; Bonn, Julia; Bhatia, Archna; Cai, Jon; Palmer, Martha; Roth, Dan (April 2020, Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020))

Spatial Reasoning from language is essential for natural language understanding. Supporting it requires a representation scheme that can capture spatial phenomena encountered in language as well as in images and videos. Existing spatial representations are not sufficient for describing spatial configurations used in complex tasks. This paper extends the capabilities of existing spatial representation languages and increases coverage of the semantic aspects that are needed to ground spatial meaning of natural language text in the world. Our spatial relation language is able to represent a large, comprehensive set of spatial concepts crucial for reasoning and is designed to support composition of static and dynamic spatial configurations. We integrate this language with the Abstract Meaning Representation (AMR) annotation schema and present a corpus annotated by this extended AMR. To exhibit the applicability of our representation scheme, we annotate text taken from diverse datasets and show how we extend the capabilities of existing spatial representation languages with fine-grained decomposition of semantics and blend it seamlessly with AMRs of sentences and discourse representations as a whole.
Full Text Available
